Re: looking for a faster way to do that - Mailing list pgsql-general

From Eduardo Morras
Subject Re: looking for a faster way to do that
Date
In response to Re: looking for a faster way to do that  (hamann.w@t-online.de)
List pgsql-general
At 14:12 23/09/2011, hamann.w@t-online.de wrote:
>Eduardo Morras wrote:
>
> >> You can try these, i doubt they will use any index but its a
> >> different approach:
> >>
> >> select * from items where
> >> length(items.code)<>length(rtrim(items.code,'ABC'));
> >>
> >> select * from items where strpos(items.code,'ABC')=0 or
> >> strpos(items.code,'any_substring')=0;
>
>Hi,
>
>if I understand this right, it does not mean "check if the string
>appears at position 0"
>which could translate into an index query, but rather "check if the
>string appears anywhere
>and then check if that is position 0", so the entire table is checked.

For the second one, yes: it checks all the patterns you want only once
per row, so it needs just one table scan. The first one removes the
'ABC' characters from the string; if the lengths of the original and
the trimmed string are equal, 'ABC' wasn't in it, and if they are
different, the trimmed string will be shorter.
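
As a self-contained illustration of the length/rtrim check (the sample
values are invented; note that rtrim's second argument is a set of
characters trimmed from the right end, not an exact substring):

select code,
       length(code)               as full_len,
       length(rtrim(code, 'ABC')) as trimmed_len
from (values ('X123ABC'), ('X123'), ('X123CAB')) as t(code);
-- rows where full_len <> trimmed_len are the ones the first query would return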

>explain analyze select items.num, wantcode from items, n where
>strpos(code, wantcode) = 0;
>  Nested Loop  (cost=167.14..196066.54 rows=39178 width=36) (actual
> time=0.074..36639.312 rows=7832539 loops=1)
>    Join Filter: (strpos(("inner".code)::text, "outer".wantcode) = 0)
>    ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32) (actual
> time=0.005..2.212 rows=815 loops=1)
>    ->  Materialize  (cost=167.14..263.28 rows=9614 width=42)
> (actual time=0.007..13.970 rows=9614 loops=815)
>          ->  Seq Scan on items  (cost=0.00..167.14 rows=9614
> width=42) (actual time=0.044..14.855 rows=9614 loops=1)
>  Total runtime: 46229.836 ms
>
>
>The query ran much faster than the pattern query, however. This
>seems to be the performance
>of just searching for a plain string vs. initializing the regex
>engine every time (for 815
>queries in a test set)

It will do only one table scan, while your original code does one scan
for each substring you want to test, so you can add more and more
substrings without much extra cost. If you want to use the regex engine
instead of the PostgreSQL string functions, check regexp_matches(); it
should be faster if you have 3000 substrings.

select * from items where regexp_matches(items.code, '(ABC) (DE1) (any_substring)') <> '{}';
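
As a rough, runnable sketch of the same idea (the substrings are
placeholders; regexp_matches() returns a set of matches, so a direct
comparison in WHERE won't work as written, and a single regex with
alternation via the ~ operator gives the same one-scan test):

-- one regex alternation covers all wanted substrings in a single table scan
select * from items where items.code ~ 'ABC|DE1|any_substring';

An EXISTS over a regexp_matches() sub-select would also work if you
need the captured groups.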

>Regards
>Wolfgang Hamann

HTH


